1/26/23
Slides modified from datascienceinabox.org
Q: What were those columns in the NC Bike Accident Data we used?
A: Variables are described here
Q: I was confused by wrap vs grid and how I should choose between them.
A: When you want two other variables in the dataset to determine which subset of the data appears in each panel (one variable for the rows, one for the columns), use facet_grid()! When you want to facet your data by a single variable and specify how many columns/rows to display, use facet_wrap()!
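As a rough illustration of that rule of thumb, here is a minimal sketch using the mpg dataset that ships with ggplot2. The dataset and the variable choices are assumptions made for the example, not something from the homework.

```r
library(ggplot2)

# Base scatterplot shared by both faceting approaches
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# Two faceting variables: drv defines the rows, cyl defines the columns
p_grid <- p + facet_grid(drv ~ cyl)

# One faceting variable: panels are laid out across 4 columns
p_wrap <- p + facet_wrap(~class, ncol = 4)

p_grid
p_wrap
```

With facet_grid() the layout is fixed by the two variables, so empty combinations still get a panel; with facet_wrap() you control the layout through ncol or nrow.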
Q: When programming in R so far, I often find myself stuck getting going on a problem and have a difficult time identifying where to start. Any tips/advice on how to get past this initial bump and start working through the problem?
A: This is a common struggle! This may sound like an old-person response, but jotting down what you have and what you want (like on actual paper/iPad) can be really helpful. For example, if you have 3 columns and you know you want to have 3 columns at the end, but you want fewer rows, you can draw a picture of this and help yourself realize you need a filter. Of course when there are multiple steps, the drawings become a bit more complex…but also more helpful! The same can be said for data visualization. Drawing out quickly what you want can help you get started.
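The "3 columns in, 3 columns out, fewer rows" sketch could translate into code along these lines. The data frame, its column names, and its values are made up for the illustration.

```r
library(dplyr)

# What you have: 5 rows, 3 columns (hypothetical data)
have <- tibble(
  id        = 1:5,
  county    = c("Wake", "Durham", "Wake", "Orange", "Wake"),
  n_crashes = c(3, 1, 4, 2, 5)
)

# What you want: the same 3 columns, but only some of the rows.
# Same columns + fewer rows is the signature of filter().
want <- have %>%
  filter(county == "Wake")

dim(have)
dim(want)
```

If the drawing had shown fewer columns instead of fewer rows, that would point toward select() rather than filter().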
Q: In what time frame will the lecture survey be available? How many hours after class will the survey close?
A: It will be open for at least 2h.
Q: For HW1 Q7, I used read_csv and got an error message. I tried read.csv and it worked. Is there any difference between read_csv and read.csv?
A: Hmm…I’d love to take a look to see what error you got. They are similar and often behave the same way. The difference is read.csv() was made before the tidyverse, so it reads your data in as a data frame. read_csv() is a function that “plays nicely” with the tidyverse and reads the data in as a tibble (the tidyverse's version of a data frame). What does that mean practically? It means that typically each one will read the data in and you’ll get the same number of rows and columns. What could differ would be the column names and/or the column types (depending upon the data). All that said, read_csv() is what I’ll recommend in this course…so that’s why I’m curious about the error you got!
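To see one of those differences concretely, here is a small sketch that writes a throwaway CSV to a temporary file and reads it both ways. The file contents are invented for the example, and this is not necessarily what caused the HW1 error.

```r
library(readr)

# Write a tiny CSV whose first column name contains a space
csv_path <- tempfile(fileext = ".csv")
writeLines(c("first name,age", "Ada,36", "Grace,45"), csv_path)

# Base R: returns a plain data.frame and, by default, repairs column names
base_df <- read.csv(csv_path)

# readr: returns a tibble and keeps the column names as written
tidy_df <- read_csv(csv_path, show_col_types = FALSE)

class(base_df)
class(tidy_df)
names(base_df)
names(tidy_df)
dim(base_df)
dim(tidy_df)
```

Both calls return 2 rows and 2 columns here; what differs is the class of the result and how the column name with a space is handled.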
Due Dates:
Course Announcements:
In September 2019, a YouGov survey asked 1,639 GB adults the following question:
In hindsight, do you think Britain was right/wrong to vote to leave EU?
- Right to leave
- Wrong to leave
- Don’t know
library(tidyverse) # provides tibble() and ggplot2

# One row per respondent (1,639 rows): opinion paired with region
brexit <- tibble(
  opinion = c(
    rep("Right", 664), rep("Wrong", 787), rep("Don't know", 188)
  ),
  region = c(
    rep("london", 63), rep("rest_of_south", 241), rep("midlands_wales", 145), rep("north", 176), rep("scot", 39),
    rep("london", 110), rep("rest_of_south", 257), rep("midlands_wales", 152), rep("north", 176), rep("scot", 92),
    rep("london", 24), rep("rest_of_south", 49), rep("midlands_wales", 57), rep("north", 48), rep("scot", 10)
  )
)

Alphabetical is rarely ideal
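One common way to move away from alphabetical ordering is to order the bars by frequency. The sketch below assumes forcats::fct_infreq() as the tool and rebuilds only the opinion column so that it runs on its own; the slides may have used a different approach.

```r
library(ggplot2)
library(forcats)

# Compact stand-in for the opinion column of the survey data
opinions <- data.frame(
  opinion = c(rep("Right", 664), rep("Wrong", 787), rep("Don't know", 188))
)

# A character variable is plotted in alphabetical order by default
alphabetical <- levels(factor(opinions$opinion))

# fct_infreq() reorders the levels from most to least frequent
by_frequency <- levels(fct_infreq(opinions$opinion))

# Bars now appear in order of frequency instead of alphabetically
p_freq <- ggplot(opinions, aes(x = fct_infreq(opinion))) +
  geom_bar() +
  labs(x = NULL)

p_freq
```

Ordering by frequency is only one option; when the categories have a natural order of their own, setting the levels by hand is usually the better choice.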
Long categories can be hard to read
# Opinion on the y-axis gives horizontal bars; one panel per region
ggplot(brexit, aes(y = opinion, fill = opinion)) +
  geom_bar() +
  facet_wrap(~region, nrow = 1) +
  guides(fill = "none") +
  labs(
    title = "Was Britain right/wrong to vote to leave EU?",
    subtitle = "YouGov Survey Results, 2-3 September 2019",
    caption = "Source: https://d25d2506sfb94s.cloudfront.net/cumulus_uploads/document/x0msmggx08/YouGov%20-%20Brexit%20and%202019%20election.pdf",
    x = NULL, y = NULL
  )

# Same plot with a facet labeller (wraps labels at spaces) and a shorter caption URL
ggplot(brexit, aes(y = opinion, fill = opinion)) +
  geom_bar() +
  facet_wrap(~region,
    nrow = 1,
    labeller = label_wrap_gen(width = 12)
  ) +
  guides(fill = "none") +
  labs(
    title = "Was Britain right/wrong to vote to leave EU?",
    subtitle = "YouGov Survey Results, 2-3 September 2019",
    caption = "Source: bit.ly/2lCJZVg",
    x = NULL, y = NULL
  )

# Assign a fill color to each opinion by name
ggplot(brexit, aes(y = opinion, fill = opinion)) +
  geom_bar() +
  facet_wrap(~region, nrow = 1, labeller = label_wrap_gen(width = 12)) +
  guides(fill = "none") +
  labs(title = "Was Britain right/wrong to vote to leave EU?",
       subtitle = "YouGov Survey Results, 2-3 September 2019",
       caption = "Source: bit.ly/2lCJZVg",
       x = NULL, y = NULL) +
  scale_fill_manual(values = c(
    "Wrong" = "red",
    "Right" = "green",
    "Don't know" = "gray"
  ))

# Replace the red/green pair with hex color codes
ggplot(brexit, aes(y = opinion, fill = opinion)) +
  geom_bar() +
  facet_wrap(~region, nrow = 1, labeller = label_wrap_gen(width = 12)) +
  guides(fill = "none") +
  labs(title = "Was Britain right/wrong to vote to leave EU?",
       subtitle = "YouGov Survey Results, 2-3 September 2019",
       caption = "Source: bit.ly/2lCJZVg",
       x = NULL, y = NULL) +
  scale_fill_manual(values = c(
    "Wrong" = "#ef8a62",
    "Right" = "#67a9cf",
    "Don't know" = "gray"
  ))

# Add a built-in theme
ggplot(brexit, aes(y = opinion, fill = opinion)) +
  geom_bar() +
  facet_wrap(~region, nrow = 1, labeller = label_wrap_gen(width = 12)) +
  guides(fill = "none") +
  labs(title = "Was Britain right/wrong to vote to leave EU?",
       subtitle = "YouGov Survey Results, 2-3 September 2019",
       caption = "Source: bit.ly/2lCJZVg",
       x = NULL, y = NULL) +
  scale_fill_manual(values = c("Wrong" = "#ef8a62",
                               "Right" = "#67a9cf",
                               "Don't know" = "gray")) +
  theme_minimal()

# Larger base font, title aligned to the whole plot, no horizontal grid lines
ggplot(brexit, aes(y = opinion, fill = opinion)) +
  geom_bar() +
  facet_wrap(~region, nrow = 1, labeller = label_wrap_gen(width = 12)) +
  guides(fill = "none") +
  labs(title = "Was Britain right/wrong to vote to leave EU?",
       subtitle = "YouGov Survey Results, 2-3 September 2019",
       caption = "Source: bit.ly/2lCJZVg",
       x = NULL, y = NULL) +
  scale_fill_manual(values = c("Wrong" = "#ef8a62",
                               "Right" = "#67a9cf",
                               "Don't know" = "gray")) +
  theme_minimal(base_size = 16) +
  theme(plot.title.position = "plot",
        panel.grid.major.y = element_blank())

ggplot2?